Abstract

In this paper we prove the separation of source-network coding and channel coding in wireline networks. For the
purposes of this work, a wireline network is any network of independent, memoryless, point-to-point, finite-alphabet
channels used to transmit dependent sources either losslessly or subject to a distortion constraint. In deriving this
result, we also prove that in a general memoryless network with dependent sources, lossless and zero-distortion
reconstruction are equivalent provided that the conditional entropy of each source given the other sources is
non-zero. Furthermore, we extend the separation result to the case of continuous-alphabet, point-to-point channels
such as additive white Gaussian noise (AWGN) channels.

I. Introduction 

In his seminal work [1], Shannon separates the problem of communicating a memoryless source across a single
noisy, memoryless channel into separate lossless source coding and channel coding problems. The corresponding
result for lossy coding in point-to-point channels, which is proven in the same work, is almost immediate since
lossy coding in a point-to-point channel is equivalent to lossless coding of the codeword indices. For a single
point-to-point channel, separation holds under a wide variety of source and channel distributions (see, for
example, [2] and the references therein). Unfortunately, separation does not necessarily hold in network systems.
Even in very small networks like the multiple access channel [3], separation can fail when statistical
dependencies between the sources at different network locations are useful for increasing the rate across the
channel. Since source codes tend to destroy such dependencies, joint source-channel codes can achieve better
performance than separate source and channel codes in these scenarios.

This paper proves the separation between source-network coding and channel coding in networks of independent
noisy, discrete, memoryless channels (DMCs). Roughly, we show that the vector of achievable distortions in delivering
a family of dependent sources across such a network $\mathcal{N}$ equals the vector of achievable distortions for
delivering the same sources across a distinct network $\mathcal{N}_b$. Network $\mathcal{N}_b$ is built by replacing
each channel $p(y|x)$ in $\mathcal{N}$ by a noiseless, point-to-point bit-pipe of the corresponding capacity
$C = \max_{p(x)} I(X;Y)$. Thus a code that applies source-network coding across links that are made almost lossless
through the application of independent channel coding across each link asymptotically achieves the optimal
performance across the network as a whole.

Note that the operations of network source coding and network coding are not separable, as shown in [4] and [5]
for lossless source coding in non-multicast and multicast networks, respectively. As a result, a joint network-source
code is required, and only the channel code can be separated. While the achievability of a separated strategy is
straightforward, the converse is more difficult since preserving statistical dependence between codewords transmitted
across distinct edges of a network of noisy links improves the end-to-end network performance in some networks [6].

The results derived here are consistent with those of [7], [8] and [6], which prove the separation between network
coding and channel coding for multicast [7], [8] and general demands [6], respectively, under the assumption that
messages transmitted to different subsets of users are independent. The shift here is from independent sources to
dependent sources and from reliable information delivery to both lossy and lossless data descriptions.

After hearing about our work, the author of [9] pointed us to his unpublished work from the 90s, which proves
the separation of lossy network source coding and channel coding in three specific network structures, namely,
the multi-terminal source coding network, the multiple description network, and Yamamoto's cascade network.
In these cases, [9] proves separation without requiring the single-letter characterizations of the distortion regions.
Our result generalizes this result to any network configuration that consists of point-to-point noisy channels. The
strategy underlying our proof follows that of [6]. The details differ significantly, however, both due to the inclusion
of dependent sources and lossy reconstruction and in the focus on discrete-alphabet channels.

The organization of this paper is as follows. Sections II and III describe the notation and problem set-up,
respectively. Section IV describes a tool from [6] called a stacked network that allows us to employ, in later
arguments, typicality across copies of a network rather than typicality across time. Section V proves the separation
of lossy source-network coding and channel coding. Section VI proves the equivalence of zero-distortion and lossless
reconstruction in general memoryless channels. Section VII shows that the separation of source-network coding
and channel coding continues to hold for well-behaved continuous channels such as AWGN channels. Section VIII
concludes the paper.

The first part of the results presented in this paper, showing the separation of lossy source-network coding and
channel coding in a wireline network of finite-alphabet DMCs with dependent sources, was first presented at ISIT
2010 [10]. A similar result by other authors was presented at the same ISIT [11].

II. Notation and definitions 

Finite sets are denoted by script letters such as $\mathcal{X}$ and $\mathcal{Y}$. The size of a finite set
$\mathcal{A}$ is denoted by $|\mathcal{A}|$. Random variables are denoted by upper case letters such as $X$ and $Y$.
Bold face letters represent vectors. The alphabet of a random variable $X$ is denoted by $\mathcal{X}$. Random
vectors are represented by upper case bold letters like $\mathbf{X}$ and $\mathbf{Y}$. The length of a vector is
implied in the context. The $\ell$-th element of a vector $\mathbf{X}$ is denoted by $X_\ell$. A vector
$\mathbf{x} = (x_1, \ldots, x_n)$ or $\mathbf{X} = (X_1, \ldots, X_n)$ is sometimes represented as $x^n$ or $X^n$.
For $1 \le i \le j \le n$, $x_i^j = (x_i, \ldots, x_j)$. For a set $\mathcal{A} \subseteq \{1, 2, \ldots, n\}$,
$x_{\mathcal{A}} = (x_i)_{i \in \mathcal{A}}$, where the elements are sorted in ascending order of their indices.

For two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^r$, $\mathbf{x} \le \mathbf{y}$ iff $x_i \le y_i$ for all
$1 \le i \le r$. The $\ell_1$ distance between two vectors $\mathbf{x}$ and $\mathbf{y}$ of the same length $r$ is
denoted by $\|\mathbf{x} - \mathbf{y}\|_1 = \sum_{i=1}^{r} |x_i - y_i|$. If $\mathbf{x}$ and $\mathbf{y}$ represent
pmfs, i.e., $\sum_{i=1}^{r} x_i = \sum_{i=1}^{r} y_i = 1$ and $x_i, y_i \ge 0$ for all $i \in \{1, \ldots, r\}$, then
the total variation distance between $\mathbf{x}$ and $\mathbf{y}$ is defined as
$\|\mathbf{x} - \mathbf{y}\|_{TV} = 0.5\,\|\mathbf{x} - \mathbf{y}\|_1$.
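
As a small illustration of the two distances just defined, the following Python sketch computes the $\ell_1$ distance and the total variation distance between two pmfs. The function names `l1_distance` and `total_variation` are illustrative choices and do not come from the paper.

```python
from typing import Sequence


def l1_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """l1 distance between two vectors of the same length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def total_variation(x: Sequence[float], y: Sequence[float], tol: float = 1e-9) -> float:
    """Total variation distance, defined here as half the l1 distance between pmfs."""
    for v in (x, y):
        # Both arguments must be pmfs: non-negative entries summing to one.
        if any(p < 0 for p in v) or abs(sum(v) - 1.0) > tol:
            raise ValueError("arguments must be probability mass functions")
    return 0.5 * l1_distance(x, y)


if __name__ == "__main__":
    print(total_variation([0.5, 0.5], [0.25, 0.75]))
```

The factor 0.5 normalizes the distance so that it lies in [0, 1] for any pair of pmfs.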
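Definition 1: The empirical distribution of a sequence $x^n \in \mathcal{X}^n$ is defined as

$$\pi(x|x^n) = \frac{|\{i : x_i = x\}|}{n},$$

for all $x \in \mathcal{X}$. Similarly, the joint empirical distribution of a sequence
$(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$ is defined as

$$\pi(x, y|x^n, y^n) = \frac{|\{i : (x_i, y_i) = (x, y)\}|}{n},$$

for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$.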
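
The empirical distributions of Definition 1 are simply normalized symbol counts. A minimal Python sketch follows; the helper names `empirical_distribution` and `joint_empirical_distribution` are hypothetical, and the alphabets are passed explicitly so that symbols that never occur receive probability zero.

```python
from collections import Counter


def empirical_distribution(seq, alphabet):
    """pi(x | x^n): fraction of positions in seq equal to each symbol x."""
    n = len(seq)
    counts = Counter(seq)
    return {a: counts[a] / n for a in alphabet}


def joint_empirical_distribution(xs, ys, x_alphabet, y_alphabet):
    """pi(x, y | x^n, y^n): fraction of positions i with (x_i, y_i) = (x, y)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    counts = Counter(zip(xs, ys))
    return {(a, b): counts[(a, b)] / n for a in x_alphabet for b in y_alphabet}


if __name__ == "__main__":
    print(empirical_distribution("aabb", "ab"))
    print(joint_empirical_distribution([0, 0, 1, 1], [0, 1, 1, 1], [0, 1], [0, 1]))
```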

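Definition 2: For a random variable $X \sim p(x)$ and a constant $\epsilon > 0$, the set
$\mathcal{T}_\epsilon^{(n)}(X)$ of $\epsilon$-typical sequences¹ of length $n$ is defined as

$$\mathcal{T}_\epsilon^{(n)}(X) = \{x^n : |\pi(x|x^n) - p(x)| \le \epsilon\, p(x) \text{ for all } x \in \mathcal{X}\}.$$

For $(X, Y) \sim p(x, y)$, the set $\mathcal{T}_\epsilon^{(n)}(X, Y)$ of jointly $\epsilon$-typical sequences is
defined as

$$\mathcal{T}_\epsilon^{(n)}(X, Y) = \{(x^n, y^n) : |\pi(x, y|x^n, y^n) - p(x, y)| \le \epsilon\, p(x, y) \text{ for all } (x, y) \in \mathcal{X} \times \mathcal{Y}\}.$$

We shall use $\mathcal{T}_\epsilon^{(n)}$ instead of $\mathcal{T}_\epsilon^{(n)}(X)$ or
$\mathcal{T}_\epsilon^{(n)}(X, Y)$ when the random variable(s) are clear from the context.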
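
The strong typicality test of Definition 2 can be checked mechanically. The sketch below is an illustration only: the function name `is_typical` is an assumption, and the condition is taken to be $|\pi - p| \le \epsilon p$ for every symbol. A jointly typical pair can be tested with the same function by passing the sequence of pairs `list(zip(xs, ys))` together with the joint pmf.

```python
from collections import Counter


def is_typical(seq, pmf, eps):
    """Return True if seq is strongly eps-typical with respect to pmf.

    pmf maps each symbol of the alphabet to its probability. A symbol of
    probability zero must not occur in seq at all, because the tolerance
    eps * p(x) is then zero.
    """
    n = len(seq)
    counts = Counter(seq)
    # Symbols outside the alphabet can never be typical.
    if any(symbol not in pmf for symbol in counts):
        return False
    return all(abs(counts[a] / n - p) <= eps * p for a, p in pmf.items())


if __name__ == "__main__":
    uniform = {0: 0.5, 1: 0.5}
    print(is_typical([0, 1, 0, 1], uniform, 0.1))
    print(is_typical([0, 0, 0, 1], uniform, 0.1))
```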
For $x^n \in \mathcal{T}_\epsilon^{(n)}$, let

III. The problem setup 

Consider a multiterminal network $\mathcal{N}$ consisting of $m$ nodes interconnected via a collection of
point-to-point, independent DMCs. The network structure is represented by a directed graph $G$ with node set
$\mathcal{V}$ and edge set $\mathcal{E}$. Each directed edge $e = (a, b) \in \mathcal{E}$ represents an independent
point-to-point DMC $(\mathcal{X}_e, p(y_e|x_e), \mathcal{Y}_e)$ between nodes $a$ (input) and $b$ (output). For the
channel represented by the edge $e$, the transition probabilities are
$\{p(y_e|x_e)\}_{(x_e, y_e) \in \mathcal{X}_e \times \mathcal{Y}_e}$. The channels are independent by assumption,
together giving a multiterminal channel
$(\prod_{e \in \mathcal{E}} \mathcal{X}_e, \prod_{e \in \mathcal{E}} p(y_e|x_e), \prod_{e \in \mathcal{E}} \mathcal{Y}_e)$.
The channel input at each node $a \in \mathcal{V}$ is $x^{(a)} = (x_{(a,v)} : (a, v) \in \mathcal{E})$. The
channel output at node $a$ is $y^{(a)} = (y_{(v,a)} : (v, a) \in \mathcal{E})$.

Each node $a$ observes some source process $\mathbf{U}^{(a)} = \{U_k^{(a)}\}_{k=1}^{\infty}$ and is interested in
reconstructing the processes observed by a subset of the other nodes. The alphabet $\mathcal{U}^{(a)}$ of source
$\mathbf{U}^{(a)}$ can be either scalar- or vector-valued. A vector-valued source $\mathbf{U}^{(a)}$ denotes a
collection of sources available at node $a$. In a block coding framework, source output symbols are divided into
non-overlapping blocks of length $L$. Each block is described separately. At the beginning of the $j$-th coding
period, each node $a$ observes a length-$L$ block of the process $\mathbf{U}^{(a)}$, i.e.,
$U^{(a),jL}_{(j-1)L+1} = (U^{(a)}_{(j-1)L+1}, \ldots, U^{(a)}_{jL})$. The blocks
$\{U^{(a),jL}_{(j-1)L+1}\}_{a \in \mathcal{V}}$ observed at the nodes $a \in \mathcal{V}$ are described over
$n$ uses of the network. The rate $\kappa = L/n$ is a parameter of the code. At each time $t \in \{1, \ldots, n\}$,
each node $a$ generates its next channel inputs as a function of its source observation $U^{(a),jL}_{(j-1)L+1}$ and
its observed channel outputs $Y^{(a),t-1} = (Y_1^{(a)}, \ldots, Y_{t-1}^{(a)})$ up to time $t - 1$ using encoder

$$X_t^{(a)} : (\mathcal{Y}^{(a)})^{t-1} \times (\mathcal{U}^{(a)})^{L} \to \mathcal{X}^{(a)}. \quad (1)$$

¹ In this paper we only consider strong typicality, and use the definition introduced in [12].

Note that each node might have more than one incoming channel and more than one outgoing channel. Thus,
$X_t^{(a)}$ and $Y_t^{(a)}$ are vectors with dimensions equal to the outdegree and indegree of node $a$,
respectively. The reconstruction at node $b$ of the source vector observed at node $a$ is denoted by
$\hat{U}^{(a \to b),L}$. This reconstruction is determined using a decoder with inputs equal to the source and
channel outputs observed at node $b$. Thus,
$\hat{U}^{(a \to b),L} = \hat{U}^{(a \to b)}(Y^{(b),n}, U^{(b),L})$, where

$$\hat{U}^{(a \to b)} : (\mathcal{Y}^{(b)})^{n} \times (\mathcal{U}^{(b)})^{L} \to (\hat{\mathcal{U}}^{(a \to b)})^{L}. \quad (2)$$

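The performance of a given code is the vector of expected average distortions between the sources
$\{\mathbf{U}^{(a)}\}_{a \in \mathcal{V}}$ and reconstructions $\{\hat{\mathbf{U}}^{(a \to b)}\}_{a, b \in \mathcal{V}}$.
For each $a, b \in \mathcal{V}$,

$$E\left[d_L^{(a \to b)}(U^{(a),L}, \hat{U}^{(a \to b),L})\right] = E\left[\frac{1}{L} \sum_{k=1}^{L} d^{(a \to b)}(U_k^{(a)}, \hat{U}_k^{(a \to b)})\right],$$

where $d^{(a \to b)} : \mathcal{U}^{(a)} \times \hat{\mathcal{U}}^{(a \to b)} \to \mathbb{R}^{+}$ is a per-letter
distortion measure, and $d^{(a \to b)}(x, \hat{x}) = 0$ if and only if $x = \hat{x}$. As mentioned before,
$\mathcal{U}^{(a)}$ and $\hat{\mathcal{U}}^{(a \to b)}$ may be either scalar- or vector-valued. This allows the case
where node $a$ observes multiple sources and node $b$ is interested in reconstructing a subset of them. Let

$$d_{\max} = \max_{a, b \in \mathcal{V},\ \alpha \in \mathcal{U}^{(a)},\ \beta \in \hat{\mathcal{U}}^{(a \to b)}} d^{(a \to b)}(\alpha, \beta) < \infty.$$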

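The $|\mathcal{V}| \times |\mathcal{V}|$ distortion matrix $D$ is said to be achievable at rate $\kappa$ in network
$\mathcal{N}$ if for any $\epsilon > 0$, there exists a pair $(L, n)$, $L/n < \kappa$, and a blocklength-$(L, n)$
coding scheme such that

$$E\left[d_L^{(a \to b)}(U^{(a),L}, \hat{U}^{(a \to b),L})\right] \le D(a, b) + \epsilon, \quad (3)$$

for every $a, b \in \mathcal{V}$. Let $\mathcal{D}(\kappa, \mathcal{N})$ denote the set of achievable distortion
matrices at rate $\kappa$ in network $\mathcal{N}$.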

Throughout the paper, for any network $\mathcal{N}$ of noisy point-to-point channels described by directed graph
$G$, let the network $\mathcal{N}_b$ denote a network of noiseless point-to-point channels described by the same
directed graph $G$. Precisely, network $\mathcal{N}_b$ replaces each noisy DMC
$(\mathcal{X}_e, p(y_e|x_e), \mathcal{Y}_e)$, $e \in \mathcal{E}$, by a noiseless bit pipe of the same finite
capacity $C_e = \max_{p(x_e)} I(X_e; Y_e)$. A bit pipe of capacity $C_e$ is an error-free, point-to-point
communication channel that delivers, in $n$ channel uses, $\lfloor n C_e \rfloor$ bits from the transmitter to the
receiver, for any $n \ge 1$. The timing of the delivery of these bits has no impact on the set of achievable
distortion matrices. This result is shown for the network capacity problem in [6]; the same argument goes through
immediately for the case of lossy reconstruction.
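
To make the replacement of a DMC by a bit pipe concrete, the sketch below estimates the capacity $C_e = \max_{p(x_e)} I(X_e; Y_e)$ of a DMC and the number of bits $\lfloor n C_e \rfloor$ that the corresponding bit pipe delivers in $n$ uses. It relies on the Blahut-Arimoto iteration, which is a standard numerical method and not something the paper itself uses. The names `dmc_capacity` and `bit_pipe_bits` are illustrative, and the channel is assumed to be given as a row-stochastic matrix `W[x][y]` $= p(y|x)$.

```python
import math


def dmc_capacity(W, tol=1e-9, max_iter=10_000):
    """Estimate the capacity (bits per use) of a DMC with W[x][y] = p(y|x).

    Blahut-Arimoto iteration starting from the uniform input distribution.
    Returns the lower bound on capacity once it is within tol of the upper bound.
    """
    nx, ny = len(W), len(W[0])
    r = [1.0 / nx] * nx  # current input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(r[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # d[x] is the divergence D(W(.|x) || q) in bits.
        d = [
            sum(W[x][y] * math.log2(W[x][y] / q[y]) for y in range(ny) if W[x][y] > 0)
            for x in range(nx)
        ]
        weights = [r[x] * 2.0 ** d[x] for x in range(nx)]
        total = sum(weights)
        lower, upper = math.log2(total), max(d)
        if upper - lower < tol:
            break
        r = [w / total for w in weights]
    return lower


def bit_pipe_bits(capacity, n):
    """Number of bits floor(n * C) delivered by a bit pipe of the given capacity in n uses."""
    return math.floor(n * capacity)


if __name__ == "__main__":
    bsc = [[0.9, 0.1], [0.1, 0.9]]  # binary symmetric channel, crossover 0.1
    c = dmc_capacity(bsc)
    print(c, bit_pipe_bits(c, 100))
```

For the binary symmetric channel the result can be compared against the closed form $1 - h(0.1)$, where $h$ is the binary entropy function.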

IV. Stacked network

The stacked network is a tool introduced in [6] for proving separation results. The key underlying observation
is that by taking multiple copies of the same network and applying the same code to that network in each copy,
we create i.i.d. copies of the input and output of a given channel at each time $t$. This allows us to later employ
typicality arguments to our channel inputs and outputs across copies of the network and not across time. Applying
typicality arguments across time is problematic since the inputs to the channel at different times $t$ need not be
i.i.d. For a given network $\mathcal{N}$, the corresponding $N$-fold stacked network $\underline{\mathcal{N}}$ is
defined as $N$ copies of the original network [6]. That is, for each node $a \in \mathcal{V}$ and each edge
$e \in \mathcal{E}$ in $\mathcal{N}$, there are $N$ copies of node $a$ and $N$ copies of edge $e$ in
$\underline{\mathcal{N}}$. At each time instance, each copy of node $a$ has access to the data available at all
copies of node $a$, and each may use this extra information in generating the channel inputs for future time
instances. Likewise, in decoding, all copies of a node can collaborate in reconstructing the source vectors. This is
made more precise in the following two definitions:

$$\underline{X}_t^{(a)} : \mathcal{Y}^{(a),N(t-1)} \times \mathcal{U}^{(a),NL} \to \mathcal{X}^{(a),N} \quad (4)$$

and

$$\hat{U}^{(a \to b),NL} : \mathcal{Y}^{(b),nN} \times \mathcal{U}^{(b),NL} \to \hat{\mathcal{U}}^{(a \to b),NL}, \quad (5)$$

which correspond to (1) and (2) in network $\mathcal{N}$. In (4),
$\underline{X}_t^{(a)} = (X_t^{(a)}(1), \ldots, X_t^{(a)}(N))$.

In an $N$-fold stacked network, the distortion between the source observed at node $a$ and its reconstruction at
node $b$ is defined as

$$D_N(a, b) = E\left[d_{NL}^{(a \to b)}(U^{(a),NL}, \hat{U}^{(a \to b),NL})\right],$$

for any $(a, b) \in \mathcal{V} \times \mathcal{V}$.

A distortion matrix $D$ is said to be achievable in the stacked version of network $\mathcal{N}$ at some rate
$\kappa$, if for any given $\epsilon > 0$, there exist $N$, $n$, and $L$ large enough, such that $L/n < \kappa$ and
$D_N(a, b) \le D(a, b) + \epsilon$, for all $a, b \in \mathcal{V}$. Let
$\mathcal{D}_s(\kappa, \underline{\mathcal{N}})$ denote the set of achievable distortion matrices at rate $\kappa$
in the stacked network $\underline{\mathcal{N}}$.

Note that the dimension of the distortion matrices in both single layer and multi-layer networks is $m \times m$.
The following theorem establishes the relationship between the two sets.

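Theorem 1: At any rate $\kappa$,

$$\mathcal{D}(\kappa, \mathcal{N}) = \mathcal{D}_s(\kappa, \underline{\mathcal{N}}). \quad (6)$$

Proof of Theorem 1:

i. $\mathcal{D}(\kappa, \mathcal{N}) \subseteq \mathcal{D}_s(\kappa, \underline{\mathcal{N}})$: Consider any
$D \in \mathrm{int}(\mathcal{D}(\kappa, \mathcal{N}))$. Then for any $\epsilon > 0$, there exists a code of rate
$\kappa < L/n$ on $\mathcal{N}$ such that (3) is satisfied. For any $N$, a stacked network that uses this same
coding strategy independently in each layer achieves expected distortion

$$E\left[d_{NL}^{(a \to b)}(U^{(a),NL}, \hat{U}^{(a \to b),NL})\right] \le \frac{1}{N} \sum_{\ell=1}^{N} [D(a, b) + \epsilon] = D(a, b) + \epsilon.$$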

ii. $\mathcal{D}_s(\kappa, \underline{\mathcal{N}}) \subseteq \mathcal{D}(\kappa, \mathcal{N})$: The proof is very
similar to the proof of the analogous part of Theorem 1 in [13], and hence is omitted.



V. Replacing a noisy channel with a bit pipe

In this section we assume that the sources are independent and identically distributed (i.i.d.) according to some
distribution $p(u_1, u_2, \ldots, u_k)$. That is, for any $k \ge 1$,



For the given i.i.d. source assumption, Theorem 2 proves that the spaces of achievable distortions for networks
$\mathcal{N}$ and $\mathcal{N}_b$ are identical. The proof follows the proof strategy of [6, Theorem 3], showing
that any code for network $\mathcal{N}_b$ can be applied across network $\mathcal{N}$ with the aid of a channel
code and any code for $\mathcal{N}$ can be applied across network $\mathcal{N}_b$ with the aid of an "emulation
code". Just as a channel code enables us to emulate a noiseless bit pipe across a noisy channel, an emulation code
enables us to emulate a noisy channel across a noiseless bit pipe. The result proves the optimality of separate
source-network codes and channel codes on networks of point-to-point DMCs. Notice, however, that separate codes are
here applied in the manner described in the proof of Theorem 1 rather than the more conventional direct application
across time.

Theorem 2: For a network $\mathcal{N}$ of independent point-to-point DMCs with memoryless sources,

$$\mathcal{D}(\kappa, \mathcal{N}) = \mathcal{D}(\kappa, \mathcal{N}_b), \quad (7)$$

for any $\kappa > 0$.

Proof of Theorem 2: By Theorem 1, the achievable region of a network $\mathcal{N}$ is equal to the achievable
region of its stacked network $\underline{\mathcal{N}}$. Hence,
$\mathcal{D}(\kappa, \mathcal{N}) = \mathcal{D}_s(\kappa, \underline{\mathcal{N}})$ and
$\mathcal{D}(\kappa, \mathcal{N}_b) = \mathcal{D}_s(\kappa, \underline{\mathcal{N}}_b)$, and therefore, it suffices
to prove that $\mathcal{D}_s(\kappa, \underline{\mathcal{N}}) = \mathcal{D}_s(\kappa, \underline{\mathcal{N}}_b)$.

i. $\mathcal{D}_s(\kappa, \underline{\mathcal{N}}_b) \subseteq \mathcal{D}_s(\kappa, \underline{\mathcal{N}})$:
Note that $\mathcal{N}$ and $\mathcal{N}_b$ are identical except that for each $e \in \mathcal{E}$, DMC
$(\mathcal{X}_e, p(y_e|x_e), \mathcal{Y}_e)$ in $\mathcal{N}$ is replaced by a bit pipe of capacity
$C_e = \max_{p(x_e)} I(X_e; Y_e)$ in $\mathcal{N}_b$. We next show that any code for
network $\underline{\mathcal{N}}_b$ can be operated on $\underline{\mathcal{N}}$ with a similar expected distortion.
Fix any code of source blocklength $LN$, channel blocklength $nN$ and expected distortion matrix $D$ for the
$N$-fold stacked network $\underline{\mathcal{N}}_b$. Now consider a $pN$-fold stacked network
$\underline{\mathcal{N}}_b$. By partitioning the $pN$ layers into $p$ stacks, each consisting of $N$ layers, and
then applying the code independently to these stacks, we can construct a code for the $pN$-fold network
$\underline{\mathcal{N}}_b$, which has the same expected distortion matrix $D$. Consider a $pM$-fold stacked network
$\underline{\mathcal{N}}$, with $M > N$. Using the mentioned strategy to construct a code for the $pN$-fold stacked
network from the code given for the $N$-fold network, at each time step $t$, each bit pipe $e$ in
$\underline{\mathcal{N}}_b$ sends a message of at most $p \lfloor N C_e \rfloor$ bits across the $pN$ copies of edge
$e$ in $\underline{\mathcal{N}}_b$. To operate the same code on network $\underline{\mathcal{N}}$, we need to send
the same information across the $pM$ copies of DMC $(\mathcal{X}_e, p(y_e|x_e), \mathcal{Y}_e)$ in
$\underline{\mathcal{N}}$. To achieve this goal, we use a channel code of blocklength $pM$ operating at rate
$R_e < C_e$. By choosing $M = \lceil N C_e / R_e \rceil$, we guarantee that $pM R_e \ge pN C_e$. Hence the $M$
copies of DMC $(\mathcal{X}_e, p(y_e|x_e), \mathcal{Y}_e)$ in $\underline{\mathcal{N}}$ carry the same information
as the $N$ copies of bit pipe $e$ in $\underline{\mathcal{N}}_b$. Since the capacity of DMC
$(\mathcal{X}_e, p(y_e|x_e), \mathcal{Y}_e)$ equals $C_e$, $R_e$ can be made arbitrarily close to $C_e$. The rate of
the code for $\underline{\mathcal{N}}$ is

$$\frac{pNL}{pMn} = \frac{NL}{\lceil N C_e / R_e \rceil\, n},$$

which can be made arbitrarily close to $\kappa = L/n$.
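
The rate computation above is easy to check numerically. The following sketch (the function name `stacked_rate` and the numerical values are hypothetical) picks $M = \lceil N C_e / R_e \rceil$ and evaluates $NL/(Mn)$, showing how the rate approaches $L/n$ as $R_e$ approaches $C_e$ and $N$ grows.

```python
import math


def stacked_rate(L, n, N, C_e, R_e):
    """Return (M, rate) for the code on the stacked noisy network.

    M = ceil(N * C_e / R_e) layers of the noisy channel carry the bits that
    N layers of the bit pipe carry; the resulting rate is N * L / (M * n).
    """
    if not 0 < R_e < C_e:
        raise ValueError("the channel code rate must satisfy 0 < R_e < C_e")
    M = math.ceil(N * C_e / R_e)
    return M, (N * L) / (M * n)


if __name__ == "__main__":
    # Illustrative numbers only: kappa = L / n = 0.5 and C_e = 0.5 bits per use.
    for N, R_e in [(10, 0.4), (1000, 0.49), (10**6, 0.4999)]:
        M, rate = stacked_rate(L=1, n=2, N=N, C_e=0.5, R_e=R_e)
        print(N, R_e, M, rate)
```

The rate never exceeds $L/n$ because $M \ge N$, and the gap is governed by the ratio $R_e / C_e$ and by the rounding in the ceiling.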

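Let $P_e^{(pM)}$ denote the maximal probability of error for the channel code of blocklength $pM$ used over the
$pM$ copies of DMC $(\mathcal{X}_e, p(y_e|x_e), \mathcal{Y}_e)$ in $\underline{\mathcal{N}}$. Let
$P_{\max}^{(pM)} = \max_{e \in \mathcal{E}} P_e^{(pM)}$. The code for each channel $e$ is used $n$ times, once for
each $t \in \{1, \ldots, n\}$. Errors in the channel code for $e$ increase the distortion achieved by applying the
code for $\underline{\mathcal{N}}_b$ across $\underline{\mathcal{N}}$. We can bound this increase in the expected
average distortion using the union bound. More precisely, let $\mathcal{R}$ denote the event that there is a
decoding error in at least one of the channels $e \in \mathcal{E}$ at some time step $t \in \{1, 2, \ldots, n\}$.
Since the sources and channel codes are independent,

$$\begin{aligned}
E\left[d_{pNL}(U^{(a),pNL}, \hat{U}^{(a \to b),pNL})\right]
&= E\left[d_{pNL}(U^{(a),pNL}, \hat{U}^{(a \to b),pNL}) \,\middle|\, \mathcal{R}^c\right] P(\mathcal{R}^c) \\
&\quad + E\left[d_{pNL}(U^{(a),pNL}, \hat{U}^{(a \to b),pNL}) \,\middle|\, \mathcal{R}\right] P(\mathcal{R}) \\
&\le D(a, b) + n |\mathcal{E}| P_{\max}^{(pM)} d_{\max},
\end{aligned}$$

for each $(a, b) \in \mathcal{V}^2$. Therefore, for fixed $n$ and $N$, letting $p \to \infty$,
$|E[d_{pNL}(U^{(a),pNL}, \hat{U}^{(a \to b),pNL})] - D(a, b)|$ can be made arbitrarily small for each
$(a, b) \in \mathcal{V}^2$.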
ii. $\mathcal{D}_s(\kappa, \underline{\mathcal{N}}) \subseteq \mathcal{D}_s(\kappa, \underline{\mathcal{N}}_b)$: Let
$D \in \mathcal{D}(\kappa, \mathcal{N})$. We prove that $D \in \mathcal{D}_s(\kappa, \underline{\mathcal{N}}_b)$.
Consider a code defined on $\mathcal{N}$ that achieves distortion $D + \epsilon \cdot \mathbf{1}$. Applying this
code independently in each layer of the $N$-fold stacked network $\underline{\mathcal{N}}$ gives a code for
$\underline{\mathcal{N}}$ with $D_N(a, b) \le D(a, b) + \epsilon$, for all $a, b \in \mathcal{V}$. We first show
that, when all sources are memoryless and uniformly distributed, the performance of the code given the realization
of $(\mathbf{X}_{e,1}, \mathbf{Y}_{e,1})$ depends only on the empirical distribution
$\{\pi(x_e, y_e|\mathbf{X}_{e,1}, \mathbf{Y}_{e,1})\}_{(x_e, y_e) \in \mathcal{X}_e \times \mathcal{Y}_e}$ of
$(\mathbf{X}_{e,1}, \mathbf{Y}_{e,1})$. Here the subscript 1 refers to time $t = 1$. After establishing this, we use
the result proved in [14] and show that at time $t = 1$ we can emulate the behavior of the noisy link across a bit
pipe of the same capacity. For the rest of the proof, let $\mathbf{U} = \{U_t\}$ denote an i.i.d. source observed at
some node $a \in \mathcal{V}$ and $\hat{\mathbf{U}} = \{\hat{U}_t\}$ denote its reconstruction at some other node
$b \in \mathcal{V} \setminus \{a\}$.

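In network $\mathcal{N}$, the expected distortion between source vector and its reconstruction is

$$E[d_L(U^L, \hat{U}^L)] = \sum_{(x_{e,1}, y_{e,1}) \in \mathcal{X}_e \times \mathcal{Y}_e} E\left[d_L(U^L, \hat{U}^L) \,\middle|\, (X_{e,1}, Y_{e,1}) = (x_{e,1}, y_{e,1})\right] \times P\left((X_{e,1}, Y_{e,1}) = (x_{e,1}, y_{e,1})\right). \quad (8)$$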



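In the $N$-fold stacked network the reconstruction of the corresponding independent copies of $U^{NL}$ by
reproduction $\hat{U}^{NL}$ satisfies

$$\begin{aligned}
E\left[d_{NL}(U^{NL}, \hat{U}^{NL})\right]
&= E\left[\frac{1}{N} \sum_{\ell=1}^{N} d_L\left(U_{(\ell-1)L+1}^{\ell L}, \hat{U}_{(\ell-1)L+1}^{\ell L}\right)\right] \\
&= E\left[\frac{1}{N} \sum_{\ell=1}^{N} \sum_{(x_{e,1}, y_{e,1}) \in \mathcal{X}_e \times \mathcal{Y}_e} d_L\left(U_{(\ell-1)L+1}^{\ell L}, \hat{U}_{(\ell-1)L+1}^{\ell L}\right) \mathbb{1}\left\{(X_{e,1}(\ell), Y_{e,1}(\ell)) = (x_{e,1}, y_{e,1})\right\}\right] \\
&= \sum_{(x_{e,1}, y_{e,1}) \in \mathcal{X}_e \times \mathcal{Y}_e} E\left[d_L(U^L, \hat{U}^L) \,\middle|\, (X_{e,1}, Y_{e,1}) = (x_{e,1}, y_{e,1})\right] E\left[\pi(x_{e,1}, y_{e,1}|\mathbf{X}_{e,1}, \mathbf{Y}_{e,1})\right], \quad (9)
\end{aligned}$$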



where each conditional expectation in (9) equals the corresponding conditional expectation in (8) since the
code used on $\underline{\mathcal{N}}$ applies the solution for $\mathcal{N}$ independently in each layer of stacked
network $\underline{\mathcal{N}}$. Equations (8) and (9) differ only in their distributions on
$\mathcal{X}_e \times \mathcal{Y}_e$. Since each conditional expectation is finite (in particular, all are bounded
by $d_{\max}$), we can replace channel $(\mathcal{X}_e, p(y_e|x_e), \mathcal{Y}_e)$ by a bit pipe of capacity $C_e$
at time $t = 1$, if we can find a coding scheme across the layers of the stack for which

$$\left|P\left((X_{e,1}, Y_{e,1}) = (x_{e,1}, y_{e,1})\right) - E\left[\pi(x_{e,1}, y_{e,1}|\mathbf{X}_{e,1}, \mathbf{Y}_{e,1})\right]\right| \quad (10)$$

can be made arbitrarily small, for all $(x_{e,1}, y_{e,1}) \in \mathcal{X}_e \times \mathcal{Y}_e$.

To prove that this is possible, consider a channel with input drawn i.i.d. from some distribution $p(x_{e,1})$. We
wish to build an emulation code with an encoder that maps $N$ source symbols,
$\mathbf{X}_{e,1} \in \mathcal{X}_e^N$, to a message of $NR$ bits and a decoder that converts these $NR$ bits into
a reconstruction block $\mathbf{Y}_{e,1} \in \mathcal{Y}_e^N$. We aim to use this code to emulate the DMC with
transition probabilities
$\{p(y_{e,1}|x_{e,1})\}_{(x_{e,1}, y_{e,1}) \in \mathcal{X}_e \times \mathcal{Y}_e}$ when the channel input is an
i.i.d. process drawn according to $p(x_{e,1})$. The codebook, $\mathcal{C}^{(N)}$, of this emulation code consists
of $2^{NR}$ codewords, $\{\mathbf{Y}_{e,1}[1], \mathbf{Y}_{e,1}[2], \ldots, \mathbf{Y}_{e,1}[2^{NR}]\}$, each drawn
independently i.i.d. according to
$p(y_{e,1}) = \sum_{x_{e,1} \in \mathcal{X}_e} p(x_{e,1}) p(y_{e,1}|x_{e,1})$. The encoder assigns message
$M \in \{1, \ldots, 2^{NR}\}$ to input sequence $\mathbf{X}_{e,1}$ if
$(\mathbf{X}_{e,1}, \mathbf{Y}_{e,1}[M]) \in \mathcal{T}_\epsilon^{(N)}(X_{e,1}, Y_{e,1})$. If there are multiple
such messages in the codebook, the encoder chooses the one with the smallest index. If there exist no codewords in
$\mathcal{C}^{(N)}$ that are jointly typical with $\mathbf{X}_{e,1}$, then the encoder assigns message $M = 1$ to
$\mathbf{X}_{e,1}$. After receiving message $M$, the decoder outputs $\mathbf{Y}_{e,1}[M]$. Let
$\{\pi(x_{e,1}, y_{e,1}|\mathbf{X}_{e,1}, \mathbf{Y}_{e,1})\}_{(x_{e,1}, y_{e,1}) \in \mathcal{X}_e \times \mathcal{Y}_e}$
be the joint empirical distribution between the channel input and channel output induced by running the emulation
code across the $N$ copies of the bit pipe at time $t = 1$. In [14], it is shown that the described code can emulate
channel $(\mathcal{X}_e, p(y_{e,1}|x_{e,1}), \mathcal{Y}_e)$ by a bit pipe of rate $R$, provided that
$R > I(X_{e,1}; Y_{e,1})$. The given emulation ensures that the total variation between
$\pi(x_{e,1}, y_{e,1}|\mathbf{X}_{e,1}, \mathbf{Y}_{e,1})$ and
$p(x_{e,1}, y_{e,1}) = p(x_{e,1}) p(y_{e,1}|x_{e,1})$ can be made arbitrarily small as the blocklength grows without
bound. In other words, there exists a sequence of codes over the bit pipe such that

$$\|\pi - p\|_{TV} \to 0 \text{ as } N \to \infty, \quad (11)$$

almost surely. (Here $\pi$ and $p$ are vectors describing distributions
$(\pi(x_{e,1}, y_{e,1}|\mathbf{X}_{e,1}, \mathbf{Y}_{e,1}) : (x_{e,1}, y_{e,1}) \in \mathcal{X}_e \times \mathcal{Y}_e)$
and $(p(x_{e,1}, y_{e,1}) : (x_{e,1}, y_{e,1}) \in \mathcal{X}_e \times \mathcal{Y}_e)$, respectively.) Although
Theorem 3 in [14] only guarantees convergence of $\pi$ to $p$ in probability, we can also prove almost sure
convergence of $\pi$ to $p$ using the Borel-Cantelli Lemma. Let $\gamma = R - I(X_{e,1}; Y_{e,1})$. Let
$\mathbf{Y}_{e,1}(\mathbf{X}_{e,1})$ denote the codeword in $\mathcal{C}^{(N)}$ that is assigned to
$\mathbf{X}_{e,1}$ by the emulation encoder. For $\epsilon > 0$, define the error event

$$\mathcal{E}^{(N)} = \left\{(\mathbf{X}_{e,1}, \mathbf{Y}_{e,1}(\mathbf{X}_{e,1})) : \|\pi - p\|_{TV} > \epsilon\right\}.$$



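Breaking the error event into two parts and then applying the union bound, Hoeffding's inequality, and the
joint typicality lemma from [15] gives

$$\begin{aligned}
P(\mathcal{E}^{(N)}) &\le P\left(\mathbf{X}_{e,1} \notin \mathcal{T}_\epsilon^{(N)}(X_{e,1})\right) + \left[P\left((\mathbf{X}_{e,1}, \mathbf{Y}_{e,1}[1]) \notin \mathcal{T}_\epsilon^{(N)}(X_{e,1}, Y_{e,1})\right)\right]^{2^{NR}} \\
&\le \sum_{x_e \in \mathcal{X}_e} P\left(|\pi(x_e|\mathbf{X}_{e,1}) - p(x_e)| > p(x_e)\epsilon\right) + e^{-2^{N(\gamma - \delta(\epsilon))}} \\
&\le 2 |\mathcal{X}_e|\, e^{-2N\epsilon^2 \min_{x_e} p^2(x_e)} + e^{-2^{N(\gamma - \delta(\epsilon))}}, \quad (12)
\end{aligned}$$

where $\delta(\epsilon) = \epsilon (H(Y_{e,1}) + H(Y_{e,1}|X_{e,1})) \to 0$, as $\epsilon \to 0$. Therefore,

$$\sum_{N=1}^{\infty} P(\mathcal{E}^{(N)}) < \infty,$$

and hence (11) holds almost surely, by the Borel-Cantelli Lemma.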
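
The emulation code described above (a random codebook drawn from the output marginal, a joint-typicality encoder that picks the smallest matching index, and a table-lookup decoder) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the construction of [14]: the function names are hypothetical, messages are indexed from 0 rather than 1, the codebook size is taken as $\lceil 2^{NR} \rceil$, and the demo parameters are chosen only so that the script runs quickly.

```python
import math
import random
from collections import Counter


def jointly_typical(x, y, p_xy, eps):
    """Strong joint typicality: |pi(a,b|x,y) - p(a,b)| <= eps * p(a,b) for all (a,b)."""
    n = len(x)
    counts = Counter(zip(x, y))
    # Pairs outside the support listed in p_xy have probability zero.
    if any(pair not in p_xy for pair in counts):
        return False
    return all(abs(counts[pair] / n - p) <= eps * p for pair, p in p_xy.items())


def build_codebook(n_layers, rate, p_y, rng):
    """Draw ceil(2^(N R)) codewords, each i.i.d. according to the output marginal p_y."""
    size = math.ceil(2 ** (n_layers * rate))
    symbols, weights = zip(*p_y.items())
    return [tuple(rng.choices(symbols, weights=weights, k=n_layers)) for _ in range(size)]


def emulation_encode(x, codebook, p_xy, eps):
    """Smallest index of a codeword jointly typical with x; index 0 if none exists."""
    for index, codeword in enumerate(codebook):
        if jointly_typical(x, codeword, p_xy, eps):
            return index
    return 0


def emulation_decode(message, codebook):
    """The decoder outputs the codeword with the received index."""
    return codebook[message]


if __name__ == "__main__":
    rng = random.Random(0)
    # Binary symmetric channel with crossover 0.25 and uniform input (illustrative values).
    p_xy = {(0, 0): 0.375, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.375}
    p_y = {0: 0.5, 1: 0.5}
    N, R, eps = 8, 1.0, 0.5
    codebook = build_codebook(N, R, p_y, rng)
    x = tuple(rng.choices([0, 1], k=N))
    m = emulation_encode(x, codebook, p_xy, eps)
    print(x, m, emulation_decode(m, codebook))
```

With such a small number of layers the empirical joint distribution is only a coarse approximation of $p(x_{e,1}, y_{e,1})$; the convergence in (11) is an asymptotic statement in $N$.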

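We next combine the emulation code with the code for $\underline{\mathcal{N}}$. The code emulates channel
$p(y_e|x_e)$ at time $t = 1$ across the $N$ layers of a stacked network that replaces $p(y_e|x_e)$ by a link of
capacity $R > C_e$ only at time $t = 1$. The given code for $\underline{\mathcal{N}}$ can be run across this network
with expected distortion bounded as

$$\begin{aligned}
E\left[d_{NL}(U^{NL}, \hat{U}^{NL})\right]
&= \sum_{(x_e, y_e)} E\left[d_L(U^L, \hat{U}^L) \,\middle|\, (X_{e,1}, Y_{e,1}) = (x_e, y_e)\right] E\left[\pi(x_e, y_e|\mathbf{X}_{e,1}, \mathbf{Y}_{e,1})\right] \\
&\le \sum_{(x_e, y_e)} E\left[d_L(U^L, \hat{U}^L) \,\middle|\, (X_{e,1}, Y_{e,1}) = (x_e, y_e)\right] \left(p(x_e, y_e) + \epsilon\right) \\
&\le E\left[d_L(U^L, \hat{U}^L)\right] + \epsilon\, |\mathcal{X}_e| |\mathcal{Y}_e|\, d_{\max}.
\end{aligned}$$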

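Thus we can replace the noisy link by a bit-pipe at time $t = 1$. We use induction to extend this result to the
next $n - 1$ time steps. Note that in the original network

$$E\left[d_L(U^L, \hat{U}^L)\right] = \sum_{(x_e^n, y_e^n)} E\left[d_L(U^L, \hat{U}^L) \,\middle|\, (X_e^n, Y_e^n) = (x_e^n, y_e^n)\right] P\left((X_e^n, Y_e^n) = (x_e^n, y_e^n)\right). \quad (13)$$

On the other hand, using the same analysis used in deriving (9), in the $N$-fold stacked network,

$$E\left[d_{NL}(U^{NL}, \hat{U}^{NL})\right] = \sum_{(x_e^n, y_e^n)} E\left[d_L(U^L, \hat{U}^L) \,\middle|\, (X_e^n, Y_e^n) = (x_e^n, y_e^n)\right] \times E\left[\pi(x_e^n, y_e^n|\mathbf{X}_e^n, \mathbf{Y}_e^n)\right]. \quad (14)$$

Here $\mathbf{X}_e^n = (\mathbf{X}_{e,1}, \mathbf{X}_{e,2}, \ldots, \mathbf{X}_{e,n})$ and
$\mathbf{Y}_e^n = (\mathbf{Y}_{e,1}, \mathbf{Y}_{e,2}, \ldots, \mathbf{Y}_{e,n})$ refer to the inputs and outputs of
channel $e$ in the $N$ layers of the stacked network, for times $t = 1, 2, \ldots, n$, while $X_e^n(\ell)$ and
$Y_e^n(\ell)$ correspond to the inputs and outputs of the emulated channel at layer $\ell$ for times
$t = 1, 2, \ldots, n$, and

$$\pi(x_e^n, y_e^n|\mathbf{X}_e^n, \mathbf{Y}_e^n) = \frac{|\{\ell : (X_e^n(\ell), Y_e^n(\ell)) = (x_e^n, y_e^n)\}|}{N}.$$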



October 18, 2011 



DRAFT 






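Therefore, we need to show that by appropriate coding over the bit-pipes,

$$\left|P\left((X_e^n, Y_e^n) = (x_e^n, y_e^n)\right) - E\left[\pi(x_e^n, y_e^n|\mathbf{X}_e^n, \mathbf{Y}_e^n)\right]\right| \quad (15)$$

can be made arbitrarily small. Note that

$$P\left((X_e^n, Y_e^n) = (x_e^n, y_e^n)\right) = \prod_{t=1}^{n} P\left((X_{e,t}, Y_{e,t}) = (x_{e,t}, y_{e,t}) \,\middle|\, (X_e^{t-1}, Y_e^{t-1}) = (x_e^{t-1}, y_e^{t-1})\right), \quad (16)$$

and

$$\pi(x_e^n, y_e^n|\mathbf{X}_e^n, \mathbf{Y}_e^n) = \prod_{t=1}^{n} \frac{\pi(x_e^{t}, y_e^{t}|\mathbf{X}_e^{t}, \mathbf{Y}_e^{t})}{\pi(x_e^{t-1}, y_e^{t-1}|\mathbf{X}_e^{t-1}, \mathbf{Y}_e^{t-1})}, \quad (17)$$

where for $t = 1$

$$\pi(x_e^{t-1}, y_e^{t-1}|\mathbf{X}_e^{t-1}, \mathbf{Y}_e^{t-1}) = 1.$$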



We have already proven that we can make the first term in the product in (17) converge to the first term in
the product in (16) with probability one. We next prove by induction that the same result is true for each
subsequent term in (16) and (17). Since all of the terms in (17) are positive and upper-bounded by 1, so too
is their product. Thus, the Dominated Convergence Theorem (see, for example, [16]) shows that (15) can be
made arbitrarily small provided that each term converges almost surely.

To apply induction, assume that there exist $t - 1$ emulation codes whose application makes the first $t - 1$
terms in (17) each converge to the corresponding term in (16) almost surely. Using this inductive hypothesis,
we prove that the $t$-th term in (17) converges to the $t$-th term in (16) as well.
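Given the inductive hypothesis that

$$\frac{\pi(x_e^{t'}, y_e^{t'}|\mathbf{X}_e^{t'}, \mathbf{Y}_e^{t'})}{\pi(x_e^{t'-1}, y_e^{t'-1}|\mathbf{X}_e^{t'-1}, \mathbf{Y}_e^{t'-1})} \to p(x_{e,t'}, y_{e,t'}|x_e^{t'-1}, y_e^{t'-1}) \quad (18)$$

almost surely, for all $(x_e^{t'}, y_e^{t'})$ and all $t' \le t - 1$, it follows that

$$\pi(x_e^{t-1}, y_e^{t-1}|\mathbf{X}_e^{t-1}, \mathbf{Y}_e^{t-1}) \to p(x_e^{t-1}, y_e^{t-1}) \quad (19)$$

almost surely, for all $(x_e^{t-1}, y_e^{t-1})$. Since the two networks apply precisely the same deterministic code
to the channel outputs at time $t - 1$ to create the channel inputs at time $t$, this bound implies

$$\pi(x_e^{t}, y_e^{t-1}|\mathbf{X}_e^{t}, \mathbf{Y}_e^{t-1}) \to p(x_e^{t}, y_e^{t-1}) \quad (20)$$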

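almost surely, for all $(x_e^{t}, y_e^{t-1})$ as well. We now show that if the emulation code used at time $t$ is
generated independently of the codes used at times $1, 2, \ldots, t - 1$, then for each $(x_e^t, y_e^t)$,

$$\pi(x_e^{t}, y_e^{t}|\mathbf{X}_e^{t}, \mathbf{Y}_e^{t}) \to p(x_e^{t}, y_e^{t-1})\, p(y_{e,t}|x_{e,t}) \quad (21)$$

almost surely, where $p(y_{e,t}|x_{e,t}) = P(Y_e = y_{e,t}|X_e = x_{e,t})$. Note that

$$\begin{aligned}
&P\left(Y_{e,t}(1) = y_{e,t} \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right) \\
&= \sum_{s^N : s_1 = x_{e,t}} P\left(Y_{e,t}(1) = y_{e,t}, \mathbf{X}_{e,t}(2:N) = s_2^N \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right) \\
&= \sum_{s^N : s_1 = x_{e,t}} P\left(\mathbf{X}_{e,t}(2:N) = s_2^N \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right) \\
&\qquad \times P\left(Y_{e,t}(1) = y_{e,t} \,\middle|\, \mathbf{X}_{e,t} = s^N, (X_e^{t-1}(1), Y_e^{t-1}(1)) = (x_e^{t-1}, y_e^{t-1})\right) \\
&= \sum_{s^N : s_1 = x_{e,t}} P\left(\mathbf{X}_{e,t}(2:N) = s_2^N \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right) \\
&\qquad \times P\left(Y_{e,t}(1) = y_{e,t} \,\middle|\, \mathbf{X}_{e,t} = s^N\right), \quad (22)
\end{aligned}$$

where the last equality holds because
$(\mathbf{X}_e^{t-1}, \mathbf{Y}_e^{t-1}) \to \mathbf{X}_{e,t} \to \mathbf{Y}_{e,t}$ forms a Markov chain, since the
emulation code maps $\mathbf{X}_{e,t}$ to $\mathbf{Y}_{e,t}$ independently of all prior channel inputs and outputs.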

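Since each network layer independently operates an identical code, and codewords in the emulation codebook
are generated according to an i.i.d. distribution, it follows that

$$P\left(Y_{e,t}(1) = y_{e,t} \,\middle|\, \mathbf{X}_{e,t} = s^N\right) = P\left(Y_{e,t}(\ell) = y_{e,t} \,\middle|\, \mathbf{X}_{e,t} = s^N\right)$$

for any $\ell$ such that $s_\ell = x_{e,t}$ under the operation of a random emulation code. Therefore,

$$\begin{aligned}
P\left(Y_{e,t}(1) = y_{e,t} \,\middle|\, \mathbf{X}_{e,t} = s^N\right)
&= \frac{1}{N \pi(x_{e,t}|s^N)} \sum_{\ell : s_\ell = x_{e,t}} P\left(Y_{e,t}(\ell) = y_{e,t} \,\middle|\, \mathbf{X}_{e,t} = s^N\right) \\
&= E\left[\frac{1}{N \pi(x_{e,t}|s^N)} \sum_{\ell : s_\ell = x_{e,t}} \mathbb{1}\{Y_{e,t}(\ell) = y_{e,t}\} \,\middle|\, \mathbf{X}_{e,t} = s^N\right] \\
&= E\left[\frac{\pi(x_{e,t}, y_{e,t}|\mathbf{X}_{e,t}, \mathbf{Y}_{e,t})}{\pi(x_{e,t}|\mathbf{X}_{e,t})} \,\middle|\, \mathbf{X}_{e,t} = s^N\right]. \quad (23)
\end{aligned}$$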



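By our inductive assumption and an argument similar to the one used in Remark 1, if
$s^N \in \mathcal{T}_\epsilon^{(N)}(X_{e,t})$, for $N$ large enough

$$\left|E\left[\frac{\pi(x_{e,t}, y_{e,t}|\mathbf{X}_{e,t}, \mathbf{Y}_{e,t})}{\pi(x_{e,t}|\mathbf{X}_{e,t})} \,\middle|\, \mathbf{X}_{e,t} = s^N\right] - p(y_{e,t}|x_{e,t})\right| \le \epsilon. \quad (24)$$

Combining (22), (23) and (24), it follows that

$$\begin{aligned}
&P\left(Y_{e,t}(1) = y_{e,t} \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right) \\
&= \sum_{s^N \in \mathcal{T}_\epsilon^{(N)}(X_{e,t}) : s_1 = x_{e,t}} P\left(\mathbf{X}_{e,t}(2:N) = s_2^N \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right) \times P\left(Y_{e,t}(1) = y_{e,t} \,\middle|\, \mathbf{X}_{e,t} = s^N\right) \\
&\quad + \sum_{s^N \notin \mathcal{T}_\epsilon^{(N)}(X_{e,t}) : s_1 = x_{e,t}} P\left(\mathbf{X}_{e,t}(2:N) = s_2^N \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right) \times P\left(Y_{e,t}(1) = y_{e,t} \,\middle|\, \mathbf{X}_{e,t} = s^N\right) \\
&\le \left(p(y_{e,t}|x_{e,t}) + \epsilon\right) P\left(\mathbf{X}_{e,t} \in \mathcal{T}_\epsilon^{(N)}(X_{e,t}) \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right) \\
&\quad + P\left(\mathbf{X}_{e,t} \notin \mathcal{T}_\epsilon^{(N)}(X_{e,t}) \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right). \quad (25)
\end{aligned}$$

Similarly,

$$P\left(Y_{e,t}(1) = y_{e,t} \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right) \ge \left(p(y_{e,t}|x_{e,t}) - \epsilon\right) P\left(\mathbf{X}_{e,t} \in \mathcal{T}_\epsilon^{(N)}(X_{e,t}) \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right). \quad (26)$$

But, if $P((X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})) \ne 0$, then

$$\begin{aligned}
P\left(\mathbf{X}_{e,t} \notin \mathcal{T}_\epsilon^{(N)}(X_{e,t}) \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right)
&= \frac{P\left(\mathbf{X}_{e,t} \notin \mathcal{T}_\epsilon^{(N)}(X_{e,t}),\ (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right)}{P\left((X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right)} \\
&\le \frac{P\left(\mathbf{X}_{e,t} \notin \mathcal{T}_\epsilon^{(N)}(X_{e,t})\right)}{P\left((X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right)} \to 0, \quad (27)
\end{aligned}$$

as $N \to \infty$, and hence
$P(\mathbf{X}_{e,t} \in \mathcal{T}_\epsilon^{(N)}(X_{e,t}) \,|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})) \to 1$,
as $N \to \infty$. Therefore, combining (25), (26), and (27), it follows that, for each $(x_e^t, y_e^t)$,

$$P\left(Y_{e,t}(1) = y_{e,t} \,\middle|\, (X_e^t(1), Y_e^{t-1}(1)) = (x_e^t, y_e^{t-1})\right) \to p(y_{e,t}|x_{e,t}), \quad (28)$$

almost surely, as $N$ grows to infinity.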

This concludes the proof, because it shows that, for each $\ell \in \{1, 2, \ldots, N\}$, as the number of layers
$N$ grows, $Y_{e,t}(\ell)$ becomes independent of $(X_e^{t-1}(\ell), Y_e^{t-1}(\ell))$ conditioned on
$X_{e,t}(\ell)$, and its conditional distribution converges to $p(y_{e,t}|x_{e,t})$ corresponding to the transition
probability of channel $e$.

■

Fig. 1. Simple point-to-point channel (Encoder → Channel → Decoder)

Remark 1: The first part of the proof of Theorem 2 is not specific to DMCs. Instead, it shows that
$\mathcal{D}(\kappa, \mathcal{N}_b) \subseteq \mathcal{D}(\kappa, \mathcal{N})$ for all networks $\mathcal{N}$ of
(discrete or continuous) point-to-point channels.

VI. Continuity: zero-distortion versus lossless 

The distortion criteria for lossless source coding and lossy source coding with a distortion constraint of zero are
different. In lossless coding, we require that the probability of error in reconstructing a vector of source symbols
goes to zero as the blocklength of that vector grows without bound. In lossy coding, we require that the per-symbol
distortion between the source vector and its reconstruction approach zero for sufficiently long blocklengths. As a
result, even under the Hamming distortion measure, zero-distortion reconstructions do not necessarily meet the
lossless source reconstruction criterion. Before investigating the relationship between these problems in a generic
network $\mathcal{N}$ of the form defined in Section III, we consider some special cases where the relationship is
known. Consider the simple point-to-point network shown in Fig. 1. Let the source $U$ be i.i.d. and distributed
according to $p(u)$, and let $C = \max_{p(x)} I(X; Y)$ denote the capacity of the point-to-point channel connecting
the source and the destination. The minimal required rate for describing the source $U$ at distortion $D$ is [17]

$$R(D) = \min_{p(\hat{u}|u) : E[d(U, \hat{U})] \le D} I(U; \hat{U}).$$

In such point-to-point networks separation of source coding and channel coding is known to be optimal [1]. Hence
to describe the source at distortion $D$, we need $C \ge \kappa R(D)$. Evaluating $R(D)$ at $D = 0$ gives

$$R(0) = \min_{p(\hat{u}|u) : E[d(U, \hat{U})] = 0} I(U; \hat{U}) = I(U; U) = H(U),$$

where $H(U)$ is the entropy rate of the source $U$. Since the minimal rate for lossless reconstruction of the source
$U$ is also the entropy rate, the zero-distortion and lossless reconstruction rate regions coincide in this simple
network. Explicit characterizations of the multi-dimensional rate-distortion regions for general multiuser networks
are unknown. Therefore, proving or disproving the equivalence of zero-distortion and lossless reconstruction rate
regions in such networks requires more elaborate analysis. In his Ph.D. thesis, W.H. Gu proved that in noiseless
networks consisting of point-to-point bit-pipes, zero-distortion and lossless reconstruction rate regions coincide
[18].
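
The gap between the two criteria discussed at the start of this section can be seen in a toy example. The sketch below is an illustration only and is not a construction from the paper: a hypothetical reconstruction rule `flip_first_symbol` errs in exactly one position of every block, so the per-symbol Hamming distortion is $1/L$ and vanishes as $L$ grows, while every block is reconstructed incorrectly.

```python
def hamming_distortion(u, u_hat):
    """Average per-symbol Hamming distortion between two equal-length blocks."""
    if len(u) != len(u_hat):
        raise ValueError("blocks must have the same length")
    return sum(a != b for a, b in zip(u, u_hat)) / len(u)


def block_error(u, u_hat):
    """True if the reconstructed block differs from the source block anywhere."""
    return tuple(u) != tuple(u_hat)


def flip_first_symbol(u):
    """Toy binary reconstruction that is wrong in exactly the first position."""
    return [1 - u[0]] + list(u[1:])


if __name__ == "__main__":
    for L in (10, 100, 1000):
        u = [0] * L
        u_hat = flip_first_symbol(u)
        print(L, hamming_distortion(u, u_hat), block_error(u, u_hat))
```

The distortion column shrinks with $L$ while the block-error column stays `True`, which is why moving from a zero-distortion guarantee to a lossless one requires an additional argument.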

In this section, we prove the equivalence of zero-distortion reconstruction and lossless reconstruction in general 
networks described by multiuser discrete memoryless channels (mDMCs) with statistically dependent sources. More 
precisely, we prove that in any mDMC with independent or dependent sources, lossless reconstruction is achievable if
and only if zero-distortion reconstruction is achievable. Given any $D \in \mathcal{D}(\kappa,\mathcal{N})$, let $\mathcal{L}(D) = \{(a,b) : D(a,b) = 0\}$.

Theorem 3: Fix any non-negative matrix $D = (D(a,b) : (a,b) \in \mathcal{V}^2)$ with $|\mathcal{L}(D)| > 0$. Then $D \in \mathcal{D}(\kappa,\mathcal{N})$
if and only if there exists a sequence of codes at rate $\kappa$ with distortion $\mathrm{E}[d_L(U^{(a),L}, \hat{U}^{(a \to b),L})] \le D(a,b)$ for all
$(a,b) \notin \mathcal{L}(D)$ and lossless reconstruction of source $a$ at node $b$ for all $(a,b) \in \mathcal{L}(D)$.
Proof of Theorem 3:

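For the forward result, fix a sequence of codes at rate $\kappa$ with distortion $\mathrm{E}[d_L(U^{(a),L}, \hat{U}^{(a \to b),L})] \le D(a,b)$ for
all $(a,b) \notin \mathcal{L}(D)$ and lossless reconstruction for all $(a,b) \in \mathcal{L}(D)$. For each $(a,b) \in \mathcal{L}(D)$,

$$P(U^{(a),L} \ne \hat{U}^{(a \to b),L}) \to 0$$

as $L \to \infty$, which implies that

$$\mathrm{E}[d_L(U^{(a),L}, \hat{U}^{(a \to b),L})] \to 0.$$

Hence, the given sequence of codes achieves zero-distortion reconstruction of source $a$ at node $b$, which gives the
desired result.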

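To prove the converse, fix any $D \in \mathcal{D}(\kappa,\mathcal{N})$ with $|\mathcal{L}(D)| > 0$ and any $\epsilon > 0$. By the definition of $\mathcal{D}(\kappa,\mathcal{N})$,
there exists a sequence of codes with source blocklength $L$ and channel blocklength $n = \lfloor \kappa L \rfloor$ such that

$$\mathrm{E}[d_L(U^{(a),L}, \hat{U}^{(a \to b),L})] \le D(a,b) + \epsilon, \quad (29)$$

for any $(a,b) \in \mathcal{V}^2$ and all $L$ sufficiently large. Specifically, for any $(a,b)$ such that $D(a,b) = 0$ and all $L$
sufficiently large, the expected distortion is at most $\epsilon$.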

We now prove that with an asymptotically negligible increase in rate $\kappa$, node $a$ can send node $b$ sufficient
information to improve node $b$'s reconstruction of node $a$'s data from a zero-distortion reproduction to a lossless
reconstruction. We further show that this change preserves the quality of all other reconstructions.

Fix $L$ sufficiently large so that $\mathrm{E}[d(U^{(a),L}, \hat{U}^{(a \to b),L})] \le D(a,b) + \epsilon$ for all $(a,b)$. The following argument
builds a code of source blocklength $NL$ and channel blocklength $Nn$.

Each node $a \in \mathcal{V}$ breaks its incoming source block of length $NL$ into $N$ non-overlapping blocks of length $L$,
given by

$$U^{(a),L}_{1},\ U^{(a),2L}_{L+1},\ \ldots,\ U^{(a),NL}_{(N-1)L+1}.$$

Each node then applies the code of blocklength $L$ $N$ times to independently code each of these blocks. In total,
this requires $Nn$ channel uses. Independently decoding each $L$-block with the blocklength-$L$ decoder achieves, for
each $a, b \in \mathcal{V}$, a reconstruction of length $NL$ such that

$$\mathrm{E}\big[d\big(U^{(a),\ell L}_{(\ell-1)L+1}, \hat{U}^{(a \to b),\ell L}_{(\ell-1)L+1}\big)\big] \le D(a,b) + \epsilon, \quad (30)$$

for each $\ell = 1, 2, \ldots, N$.

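For $(a,b) \in \mathcal{L}(D)$ and each $\ell \in \{1, \ldots, N\}$, denote the input of node $a$ in session $\ell$ as $U^L(\ell) = U^{(a),\ell L}_{(\ell-1)L+1}$, and
the corresponding output at node $b$ as $\hat{U}^L(\ell) = \hat{U}^{(a \to b),\ell L}_{(\ell-1)L+1}$. By assumption,

$$\mathrm{E}[d(U^L(\ell), \hat{U}^L(\ell))] \le \epsilon.$$

Thus

$$\epsilon \ge \mathrm{E}[d(U^L(\ell), \hat{U}^L(\ell))] = \frac{1}{L} \sum_{i=1}^{L} \mathrm{E}[d(U_i(\ell), \hat{U}_i(\ell))] \ge \frac{1}{L} \sum_{i=1}^{L} d_{\min} P(U_i(\ell) \ne \hat{U}_i(\ell)), \quad (31)$$

where $d_{\min} = \min_{(u,\hat{u}) \in \mathcal{U} \times \hat{\mathcal{U}}:\, u \ne \hat{u}} d(u,\hat{u})$. Since all alphabets are assumed to be finite, and $d(u,\hat{u}) = 0$ if and only
if $u = \hat{u}$, $d_{\min} > 0$ by assumption. Therefore,

$$\frac{1}{L} \sum_{i=1}^{L} P(U_i(\ell) \ne \hat{U}_i(\ell)) \le \frac{\epsilon}{d_{\min}}$$

for all $\ell \in \{1, 2, \ldots, N\}$.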

Recall that all sources and channels are memoryless by assumption and that the same code is used independently
on each $L$-vector. Therefore, $\{(U^L(\ell), \hat{U}^L(\ell))\}_{\ell=1}^{N}$ is an i.i.d. sequence. (See Fig. 2.) Our goal in the argument that
follows is to losslessly describe $U^L(1), U^L(2), \ldots$ to a decoder that knows $\hat{U}^L(1), \hat{U}^L(2), \ldots$. We treat this as
a problem of lossless source coding with receiver side information, as shown in Fig. 3. From [19], rate $R_0 =
H(U^L|\hat{U}^L)$ suffices for losslessly reconstructing $U^L$ at a receiver that knows $\hat{U}^L$. Here lossless coding means that
the probability that the decoder's reconstruction differs from $U^{LN}$ can be made arbitrarily small, which is precisely
the criterion needed for our proof. Using Fano's inequality [17], Jensen's inequality, and the concavity of the entropy function,

$$\begin{aligned}
H(U^L|\hat{U}^L) &\le \sum_{i=1}^{L} H(U_i|\hat{U}_i) \\
&\le \sum_{i=1}^{L} \big[ h(P(U_i \ne \hat{U}_i)) + \log|\mathcal{U}| \, P(U_i \ne \hat{U}_i) \big] \\
&\le L\, h\Big( \frac{1}{L} \sum_{i=1}^{L} P(U_i \ne \hat{U}_i) \Big) + \log|\mathcal{U}| \sum_{i=1}^{L} P(U_i \ne \hat{U}_i) \\
&\le L \Big[ h\Big( \frac{\epsilon}{d_{\min}} \Big) + \frac{\epsilon \log|\mathcal{U}|}{d_{\min}} \Big] \\
&= L f(\epsilon), \quad (32)
\end{aligned}$$

provided $\epsilon / d_{\min} \le 0.5$, where for any $0 \le p \le 1$, $h(p) = -p \log p - (1-p) \log(1-p)$, and $f(\epsilon) = h(\epsilon/d_{\min}) + \epsilon \log|\mathcal{U}| / d_{\min}$.

Note that $f(\epsilon) \to 0$ as $\epsilon \to 0$.
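To see how quickly the per-symbol bound in (32) vanishes, the following sketch evaluates it for a few values of $\epsilon$. It assumes the reading $f(\epsilon) = h(\epsilon/d_{\min}) + (\epsilon/d_{\min}) \log|\mathcal{U}|$ with logarithms in base 2. The function name `f_bound` and the sample values of `d_min` and `alphabet_size` are hypothetical and chosen only for illustration.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def f_bound(eps: float, d_min: float, alphabet_size: int) -> float:
    """Assumed form of f(eps): h(eps/d_min) + (eps/d_min) * log2|U|."""
    q = eps / d_min
    if q > 0.5:
        # The derivation of (32) relies on h being increasing on [0, 0.5].
        raise ValueError("the bound requires eps / d_min <= 0.5")
    return binary_entropy(q) + q * math.log2(alphabet_size)


# Hypothetical setting: quaternary source alphabet, Hamming distortion (d_min = 1).
for eps in (1e-1, 1e-2, 1e-4, 1e-8):
    print(f"eps = {eps:.0e}   f(eps) = {f_bound(eps, 1.0, 4):.3e} bits/symbol")
```

The values shrink toward zero with $\epsilon$, so the Slepian-Wolf rate $R_0 \le L f(\epsilon)$ is a vanishing fraction of the $L$ source symbols in each session.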

We send the rate-$R_0$ description of $U^L$ from node $a$ to node $b$ by treating the random mapping from $U^L$ to $\hat{U}^L$ that
results from applying the given code across the given network as a noisy channel. Specifically, we send $N'$ dummy
source vectors $U^L(N+1), \ldots, U^L(N+N')$, thereby creating $N'$ uses of a channel $p(\hat{u}^L|u^L)$ through which we
can reliably transmit the lossless description of $U^L(1), \ldots, U^L(N)$ to the decoder. The decoder's reconstructions
$\hat{U}^L(1), \ldots, \hat{U}^L(N)$ of source vectors $U^L(1), \ldots, U^L(N)$ are treated as side information known only to the decoder.

The following discussion describes the approach precisely and investigates its performance. The code used to
losslessly describe $U^L(1), \ldots, U^L(N)$ from node $a$ to node $b$ employs fixed source values $U^{(v),L}(N+1) =
\ldots = U^{(v),L}(N+N')$ for all other nodes $v \ne a$ in the network. The value transmitted by each node
$v \in \mathcal{V} \setminus \{a\}$ is chosen as follows.

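Since distortion is non-negative by assumption,

$$\epsilon \ge \mathrm{E}[d(U^L, \hat{U}^L)] \ge \mathrm{E}\big[d(U^L, \hat{U}^L) \,\big|\, U^{(-a),L} \in \mathcal{T}_\delta^{(L)}\big] \, P\big(U^{(-a),L} \in \mathcal{T}_\delta^{(L)}\big),$$

where $U^{(-a),L} \triangleq (U^{(v),L})_{v \in \mathcal{V} \setminus a}$. For any $\delta > 0$ and all $L$ large enough, $P(U^{(-a),L} \in \mathcal{T}_\delta^{(L)}) \ge 1 - \delta$, which
implies that

$$\mathrm{E}\big[d(U^L, \hat{U}^L) \,\big|\, U^{(-a),L} \in \mathcal{T}_\delta^{(L)}\big] \le \frac{\epsilon}{1-\delta}.$$

Hence, there exists $u^{(-a),L} \in \mathcal{T}_\delta^{(L)}$ such that

$$\mathrm{E}\big[d(U^L, \hat{U}^L) \,\big|\, U^{(-a),L} = u^{(-a),L}\big] \le \frac{\epsilon}{1-\delta}. \quad (33)$$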

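Fix any such $u^{(-a),L}$. To bound the capacity of the resulting channel, we first bound the conditional entropy
of $U^L$ given $\hat{U}^L$ when $U^{(-a),L} = u^{(-a),L}$. Following steps similar to those in (31) and (32), but
conditioning on $U^{(-a),L} = u^{(-a),L}$, we conclude that

$$H\big(U^L \,\big|\, \hat{U}^L, U^{(-a),L} = u^{(-a),L}\big) \le L f\Big(\frac{\epsilon}{1-\delta}\Big).$$

To finish our capacity calculation, we next bound the entropy of $U^L$ given $U^{(-a),L} = u^{(-a),L}$. Since $u^{(-a),L} \in \mathcal{T}_\delta^{(L)}$,
for any $u^L \in \mathcal{T}_\delta^{(L)}(U|u^{(-a),L})$,

$$p(u^L | u^{(-a),L}) \le 2^{-L(1-\delta) H(U|U^{(-a)})}$$

by [18]. Hence, for $L$ large enough,

$$\begin{aligned}
H\big(U^L \,\big|\, U^{(-a),L} = u^{(-a),L}\big) &\ge \sum_{u^L \in \mathcal{T}_\delta^{(L)}(U|u^{(-a),L})} p(u^L|u^{(-a),L}) \log \frac{1}{p(u^L|u^{(-a),L})} \\
&\ge (1-\delta)^2 L H(U|U^{(-a)}),
\end{aligned}$$

where the last line follows since, for $L$ large enough, $P(U^L \in \mathcal{T}_\delta^{(L)}(U|u^{(-a),L}) \,|\, U^{(-a),L} = u^{(-a),L}) \ge 1 - \delta$.
Hence, fixing $U^{(-a),L} = u^{(-a),L}$ yields a channel $p(\hat{u}^L | u^L, U^{(-a),L} = u^{(-a),L})$ with capacity

$$C_0 \ge (1-\delta)^2 L H(U|U^{(-a)}) - L f\Big(\frac{\epsilon}{1-\delta}\Big). \quad (34)$$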

Thus the rate required to losslessly describe $U^{LN}$ to a decoder with reproduction $\hat{U}^{LN}$ of $U^{LN}$ is at most $R_0 N$,
and the capacity of the channel over which we wish to describe $U^{LN}$ is at least $C_0$ bits per $L$ network uses. We
can therefore achieve the desired lossless description provided that $N' C_0 \ge N R_0$, giving $N' \ge N R_0 / C_0$. Thus the
total number of sessions required to send first the lossy description and then the lossless incremental description is
$N + N' \ge N(1 + R_0/C_0)$. Here

$$\frac{R_0}{C_0} \le \frac{L f(\epsilon)}{(1-\delta)^2 L H(U|U^{(-a)}) - L f\big(\frac{\epsilon}{1-\delta}\big)} = \frac{f(\epsilon)}{(1-\delta)^2 H(U|U^{(-a)}) - f\big(\frac{\epsilon}{1-\delta}\big)},$$

which approaches zero as $\epsilon$ approaches zero and $\delta$ approaches zero. The resulting coding rate is

$$\frac{\kappa}{1 + R_0/C_0},$$

which approaches $\kappa$ as $\epsilon$ approaches zero and $\delta$ approaches zero.
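The rate penalty of the extra $N'$ sessions can be illustrated numerically. The sketch below evaluates the upper bound on $R_0/C_0$ and the corresponding value of $\kappa/(1 + R_0/C_0)$, assuming $f(\epsilon) = h(\epsilon/d_{\min}) + (\epsilon/d_{\min})\log_2|\mathcal{U}|$ and a strictly positive conditional entropy $H(U|U^{(-a)})$. The function names and all numerical values (`kappa = 1.0`, `cond_entropy = 0.5`, binary alphabet, `d_min = 1`) are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def f_bound(eps: float, d_min: float, alphabet_size: int) -> float:
    """Assumed form of f(eps): h(eps/d_min) + (eps/d_min) * log2|U|."""
    q = eps / d_min
    return binary_entropy(q) + q * math.log2(alphabet_size)


def refined_rate(kappa, eps, delta, cond_entropy, d_min=1.0, alphabet_size=2):
    """Return (upper bound on R0/C0, kappa / (1 + that bound))."""
    numerator = f_bound(eps, d_min, alphabet_size)
    denominator = ((1.0 - delta) ** 2 * cond_entropy
                   - f_bound(eps / (1.0 - delta), d_min, alphabet_size))
    if denominator <= 0.0:
        # The capacity lower bound (34) is vacuous for these parameters.
        raise ValueError("eps and delta are too large for the bound to apply")
    overhead = numerator / denominator
    return overhead, kappa / (1.0 + overhead)


for eps in (1e-2, 1e-4, 1e-6):
    overhead, rate = refined_rate(kappa=1.0, eps=eps, delta=eps, cond_entropy=0.5)
    print(f"eps = delta = {eps:.0e}   R0/C0 <= {overhead:.3e}   rate >= {rate:.6f}")
```

The denominator check makes the role of the non-zero conditional entropy explicit: if $H(U|U^{(-a)})$ were zero, the capacity bound (34) would give no usable channel for the refinement.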

If $|\mathcal{L}(D)| > 1$, then, for each $(a,b) \in \mathcal{L}(D)$, we can apply the same procedure to convert the zero-distortion
reconstruction of source $a$ at node $b$ into a lossless reconstruction. Hence, for each source-destination pair in $\mathcal{L}(D)$,
we require a number of extra sessions. But, since $|\mathcal{L}(D)| \le |\mathcal{V}|^2$ is a finite number, the resulting coding rate $\kappa'$,
after adding these extra sessions, still approaches $\kappa$ as the $\epsilon$ and $\delta$ corresponding to each $(a,b) \in \mathcal{L}(D)$ converge
to zero.

[Figure 2: Session 1, Session 2, ..., Session N]

Fig. 2. Source U and its reconstruction at L parallel sessions.



[Figure 3: blocks Enc. and Dec., link labeled $R_0$]

Fig. 3. Slepian-Wolf coding for converting zero-distortion reconstruction into lossless reconstruction.



Combining Theorem 3, Theorem 2, and the result proved by W. Gu in [18] proves the separation of source-network
coding and channel coding in a wireline network with dependent sources with lossy or lossless reconstructions.
In particular, this result partially extends the separation result of [6] to the case where the sources are dependent.
The extension is partial since in [6] the channels can be discrete or continuous, but here we have only considered
discrete channels. In the next section, we consider the case of AWGN channels.

VII. Continuous channels

While the capacity results of [6] are proven for general (discrete or continuous) alphabets, the sources and
channels considered in Theorems 1 and 2 were all assumed to have finite alphabets. In this section, we prove that
our results also hold for AWGN channels. In order to prove this we use the discretization method introduced in
[20].
Consider a wireline network $\mathcal{N}$ with an AWGN channel from node $a$ to node $b$. Let the input and output of
this channel be $X$ and $Y = X + Z$, respectively. Assume input power constraint $P$ and noise power $N$. Let $\mathcal{N}_b$
be a wireline network that is identical to network $\mathcal{N}$ except that the channel from $a$ to $b$ is replaced by a bit pipe
of capacity $C = 0.5\log(1 + P/N)$. Theorem 4 shows, as in the case of discrete-valued channels, that this change
does not affect the set of achievable distortions, thereby generalizing Theorem 2.

In this section, for a technical reason which is made clear in the proof, we restrict the joint source-channel codes 
to the set of admissible codes which are defined as follows. 

Definition 3 (Admissible codes): Consider a joint source-channel code of source blocklength $L$ and channel
blocklength $n$ for network $\mathcal{N}$. For each $t = 1, 2, \ldots, n$, define

$$\delta_t^{(a,b)}(x_t, y_t) = \mathrm{E}\big[d_L(U^{(a),L}, \hat{U}^{(a \to b),L}) \,\big|\, (X_t, Y_t) = (x_t, y_t)\big].$$

The code is called admissible if, for every fixed $x_t \in \mathbb{R}$ and $(a,b) \in \mathcal{V}^2$, $\delta_t^{(a,b)}(x_t, y_t)$ is a non-decreasing function
of $|y_t - x_t|$.
In other words, a code is admissible if its performance does not improve, i.e., stays the same or deteriorates, as the
absolute value of the Gaussian noise increases. Let $\mathcal{D}_a(\kappa,\mathcal{N})$ denote the set of distortion matrices that are achievable
over network $\mathcal{N}$ using admissible codes. Here $\mathcal{D}_a(\kappa,\mathcal{N}) \subseteq \mathcal{D}(\kappa,\mathcal{N})$; whether the two regions are identical remains
an open problem for future work.

Theorem 4: For a wireline network consisting of discrete or AWGN point-to-point channels,

$$\mathcal{D}_a(\kappa,\mathcal{N}) \subseteq \mathcal{D}(\kappa,\mathcal{N}_b) \subseteq \mathcal{D}(\kappa,\mathcal{N}).$$

Proof of Theorem 4: The second inclusion is immediate since the first part of the proof of Theorem 2
applies equally well to the case of continuous channels. To prove the first inclusion, we employ the discretization method
used in [20]. Let network $\mathcal{N}^{(\mathbf{j},\mathbf{k})}$, with $\mathbf{j} = (j_1, j_2, \ldots, j_n)$ and $\mathbf{k} = (k_1, k_2, \ldots, k_n)$, denote the network derived
from network $\mathcal{N}$ by replacing the AWGN channel from $a$ to $b$ by the structure shown in Fig. 4. The given
channel relies on a pair of quantizers $Q[j]$ and $Q[k]$ parametrized by indices $j$ and $k$. We allow the quantizer
parameters to vary with $t$, setting $j = j_t$ and $k = k_t$ for each time $t \in \{1, 2, \ldots, n\}$. The quantizer $Q[i]$ is
defined as follows. For $i \in \{1, 2, \ldots\}$, let $\Delta = 1/\sqrt{i}$, and define the quantizer $Q[i]$ with quantization levels
$\mathcal{C}_i = \{-i\Delta, -(i-1)\Delta, \ldots, -\Delta, 0, \Delta, \ldots, (i-1)\Delta, i\Delta\}$. For any $x \in \mathbb{R}$, $Q[i]$ maps $x$ to $[x]_i$, which is the closest
number to $x$ in $\mathcal{C}_i$ such that $|[x]_i| \le |x|$. Note that by this definition, $\mathrm{E}[[X]_i^2] \le \mathrm{E}[X^2]$ for any random variable $X$.
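The quantizer $Q[i]$ can be sketched in a few lines. The version below assumes the reading that $[x]_i$ is the level in $\{-i\Delta, \ldots, i\Delta\}$, $\Delta = 1/\sqrt{i}$, closest to $x$ among those with magnitude at most $|x|$, i.e., rounding toward zero with clipping at $\pm\sqrt{i}$. The function name `quantize` and the Gaussian test samples are illustrative choices.

```python
import math
import random


def quantize(x: float, i: int) -> float:
    """Assumed form of Q[i]: round x toward zero on a grid of step 1/sqrt(i),
    clipped to the outermost levels +/- i * delta = +/- sqrt(i)."""
    delta = 1.0 / math.sqrt(i)
    steps = min(math.floor(abs(x) / delta), i)  # number of grid steps kept
    return math.copysign(steps * delta, x)


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # hypothetical inputs
for i in (1, 4, 16, 256):
    quantized = [quantize(x, i) for x in samples]
    power_in = sum(x * x for x in samples) / len(samples)
    power_out = sum(q * q for q in quantized) / len(quantized)
    print(f"i = {i:4d}   E[X^2] ~ {power_in:.4f}   E[[X]_i^2] ~ {power_out:.4f}")
```

Because every output has magnitude no larger than its input, the empirical output power never exceeds the input power, which is the property used later to keep the power constraint $P$ intact. As $i$ grows the step shrinks and the clipping range widens, so the quantized values approach the inputs.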

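Lemma 1 in Appendix A shows that as $\mathbf{j}$ and $\mathbf{k}$ increase, the set of achievable distortions on $\mathcal{N}^{(\mathbf{j},\mathbf{k})}$ approaches
the set of achievable distortions on the original network. More precisely,

$$\mathcal{D}_a(\kappa,\mathcal{N}) \subseteq \limsup_{\mathbf{j},\mathbf{k}} \mathcal{D}(\kappa,\mathcal{N}^{(\mathbf{j},\mathbf{k})}), \quad (35)$$

where

$$\limsup_{\mathbf{j},\mathbf{k}} A_{\mathbf{j},\mathbf{k}} = \bigcap_{\mathbf{j}_0,\mathbf{k}_0} \overline{\bigcup_{\mathbf{j} > \mathbf{j}_0,\ \mathbf{k} > \mathbf{k}_0} A_{\mathbf{j},\mathbf{k}}},$$

and $\overline{A}$ denotes the closure of the set $A$.
We next show that

$$\mathcal{D}(\kappa,\mathcal{N}^{(\mathbf{j},\mathbf{k})}) \subseteq \mathcal{D}(\kappa,\mathcal{N}_b). \quad (36)$$

This is sufficient to obtain the desired result since (35) and (36) together imply $\mathcal{D}_a(\kappa,\mathcal{N}) \subseteq \mathcal{D}(\kappa,\mathcal{N}_b)$ by the
closure in the definition of $\mathcal{D}(\kappa,\mathcal{N}_b)$.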

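To prove that $\mathcal{D}(\kappa,\mathcal{N}^{(\mathbf{j},\mathbf{k})}) \subseteq \mathcal{D}(\kappa,\mathcal{N}_b)$, note that, for each time $t$, the structure shown in Fig. 4 behaves like a
DMC with input $[X]_{j_t}$ and output $[Y_{j_t}]_{k_t}$, where $Y_{j_t} = [X]_{j_t} + Z$. Hence, by straightforward extension of the proof of Theorem 2,

$$\mathcal{D}(\kappa,\mathcal{N}^{(\mathbf{j},\mathbf{k})}) \subseteq \mathcal{D}(\kappa,\mathcal{N}_b^{(\mathbf{j},\mathbf{k})}),$$

where $\mathcal{N}_b^{(\mathbf{j},\mathbf{k})}$ is identical to $\mathcal{N}^{(\mathbf{j},\mathbf{k})}$ except that the channel from $a$ to $b$ is replaced by a bit pipe of capacity $C_{\mathbf{j},\mathbf{k}}$
equal to the maximum capacity of the $n$ DMCs. Here

$$C_{\mathbf{j},\mathbf{k}} = \max_{1 \le t \le n}\ \max_{p_X:\, [X]_{j_t} \sim p_X} I([X]_{j_t}; [Y_{j_t}]_{k_t}).$$

By the data processing inequality [17],

$$I([X]_{j_t}; [Y_{j_t}]_{k_t}) \le I([X]_{j_t}; Y_{j_t}) = h(Y_{j_t}) - h(Z).$$

On the other hand, by the construction of the quantizers,

$$\mathrm{E}[Y_{j_t}^2] = \mathrm{E}[[X]_{j_t}^2] + N \le \mathrm{E}[X^2] + N.$$

Hence,

$$h(Y_{j_t}) \le 0.5\log(2\pi e(P + N)),$$

and as a result

$$C_{\mathbf{j},\mathbf{k}} \le 0.5\log(2\pi e(P+N)) - 0.5\log(2\pi e N) = 0.5\log(1 + P/N) = C.$$

Therefore, $\mathcal{D}(\kappa,\mathcal{N}^{(\mathbf{j},\mathbf{k})}) \subseteq \mathcal{D}(\kappa,\mathcal{N}_b)$.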
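The last step of this argument is a short entropy calculation that can be checked numerically: the Gaussian maximum-entropy bound on $h(Y_{j_t})$ minus $h(Z)$ equals the AWGN capacity $0.5\log(1 + P/N)$. The sketch below assumes base-2 logarithms (so rates are in bits) and uses hypothetical values of $P$ and $N$; the function names are illustrative.

```python
import math


def gaussian_entropy_bits(variance: float) -> float:
    """Differential entropy of a Gaussian with the given variance, in bits."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * variance)


def awgn_capacity_bits(P: float, N: float) -> float:
    """AWGN capacity 0.5 * log2(1 + P/N) in bits per channel use."""
    return 0.5 * math.log2(1.0 + P / N)


def quantized_channel_bound_bits(P: float, N: float) -> float:
    """Upper bound h(Y) - h(Z) when the output power is at most P + N."""
    return gaussian_entropy_bits(P + N) - gaussian_entropy_bits(N)


for P, N in ((3.0, 1.0), (10.0, 2.0), (0.5, 4.0)):  # hypothetical power levels
    bound = quantized_channel_bound_bits(P, N)
    capacity = awgn_capacity_bits(P, N)
    print(f"P = {P}, N = {N}:   bound = {bound:.6f}   capacity = {capacity:.6f}")
```

The two columns agree up to floating-point error, which is why the bit pipe of capacity $C$ is at least as capable as each quantized channel.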
[Figure 4: quantizer blocks, including $Q[k]$]

Fig. 4. Quantizing the input and output alphabets of an AWGN channel.



VIII. Conclusions 

In this paper we proved the separation of source-network and channel coding in general wireline networks of 
independent discrete point-to-point channels with dependent sources and arbitrary lossy or lossless reconstruction 
demands. We also proved that the result continues to hold when one or more channels is an AWGN channel and 
we restrict our attention to admissible codes. 

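Appendix A: Lemma 1

Lemma 1: For any $\kappa > 0$,

$$\mathcal{D}_a(\kappa,\mathcal{N}) \subseteq \limsup_{\mathbf{j},\mathbf{k}} \mathcal{D}(\kappa,\mathcal{N}^{(\mathbf{j},\mathbf{k})}), \quad \text{(A-1)}$$

where $\overline{A}$ denotes the closure of set $A$.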

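Proof: Let $D \in \mathcal{D}_a(\kappa,\mathcal{N})$. For any $\epsilon > 0$, and for $L$ sufficiently large, there exists an admissible joint source-channel
code at rate $\kappa$ with source blocklength $L$ such that

$$\mathrm{E}[d_L(U^{(a),L}, \hat{U}^{(a \to b),L})] \le D(a,b) + \epsilon \quad \text{(A-2)}$$

holds for each $(a,b) \in \mathcal{V}^2$. Let $U^L = U^{(a),L}$ and $\hat{U}^L = \hat{U}^{(a \to b),L}$ for some fixed $(a,b) \in \mathcal{V}^2$.

Conditioning the expected average distortion between $U^L$ and $\hat{U}^L$ on the input and output values of the AWGN
channel at time $t = 1$, it follows that

$$\begin{aligned}
D(a,b) + \epsilon &\ge \mathrm{E}[d(U^L, \hat{U}^L)] \\
&= \sum_{(x_1,y_1)} P(x_1,y_1)\, \mathrm{E}[d(U^L, \hat{U}^L) \,|\, (X_1,Y_1) = (x_1,y_1)] \\
&= \mathrm{E}[\delta^{(1)}(X_1,Y_1)], \quad \text{(A-3)}
\end{aligned}$$

where $\delta^{(1)}(x_1,y_1) \triangleq \mathrm{E}[d(U^L, \hat{U}^L) \,|\, (X_1,Y_1) = (x_1,y_1)]$.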

Now assume that the same code is applied to network $\mathcal{N}^{(j_1,k_1)}$, which is identical to $\mathcal{N}$ except that at time $t = 1$,
the AWGN channel is replaced by the structure shown in Fig. 4 with parameters $j = j_1$ and $k = k_1$. The expected
average distortion between $U^L$ and $\hat{U}^L$ in the modified network, $D^{(j_1,k_1)}(a,b)$, can be written as

$$D^{(j_1,k_1)}(a,b) = \mathrm{E}[\delta^{(1)}(X_1, \tilde{Y}_1)], \quad \text{(A-4)}$$

where $\tilde{Y}_1 = [[X_1]_{j_1} + Z_1]_{k_1}$. Note that, conditioned on the input and output values of the AWGN channel at time
$t = 1$, the two networks have identical performance.

Further, $\tilde{Y}_1$ converges pointwise to $Y_1$ almost everywhere as $j_1$ and $k_1$ grow without bound, i.e.,

$$\lim_{k_1 \to \infty} \lim_{j_1 \to \infty} \tilde{Y}_1 = Y_1, \quad \text{(A-5)}$$

almost everywhere, where $Y_1 = X_1 + Z_1$.
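By the law of iterated expectations,

$$\begin{aligned}
\lim_{k_1 \to \infty} \lim_{j_1 \to \infty} D^{(j_1,k_1)}(a,b) &= \lim_{k_1 \to \infty} \lim_{j_1 \to \infty} \mathrm{E}[\delta^{(1)}(X_1, \tilde{Y}_1)] \\
&= \lim_{k_1 \to \infty} \lim_{j_1 \to \infty} \mathrm{E}\big[ \mathrm{E}[\delta^{(1)}(X_1, \tilde{Y}_1) \,|\, X_1] \big] \\
&= \mathrm{E}\Big[ \lim_{k_1 \to \infty} \lim_{j_1 \to \infty} \mathrm{E}[\delta^{(1)}(X_1, \tilde{Y}_1) \,|\, X_1] \Big], \quad \text{(A-6)}
\end{aligned}$$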



where the last line follows from the bounded convergence theorem. Thus far we have proved that $\tilde{Y}_1$ converges
to $Y_1$ almost everywhere. In order to finish this part of the proof, we need to show that for each $X_1 = x_1$,
$\delta^{(1)}(x_1, \tilde{Y}_1)$ converges almost surely to $\delta^{(1)}(x_1, Y_1)$ as well. For each fixed $X_1 = x_1$, let $\mathcal{A}_{\delta^{(1)}}(x_1)$ denote the set
of discontinuity points of $\delta^{(1)}(x_1, Y_1)$ in $Y_1$. By our assumption, the code is an admissible code. Therefore, for a
fixed $x_1$, $\delta^{(1)}(x_1, y_1)$ is a non-decreasing function of $|y_1 - x_1|$. Hence, its number of discontinuity points is countable
[21], [22], and

$$\begin{aligned}
P(Y_1 \in \mathcal{A}_{\delta^{(1)}}(x_1) \,|\, X_1 = x_1) &= P(Z_1 \in \mathcal{A}_{\delta^{(1)}}(x_1) - x_1 \,|\, X_1 = x_1) \\
&= P(Z_1 \in \mathcal{A}_{\delta^{(1)}}(x_1) - x_1) = 0. \quad \text{(A-7)}
\end{aligned}$$
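Therefore, $\delta^{(1)}(x_1, \tilde{Y}_1)$ converges almost surely to $\delta^{(1)}(x_1, Y_1)$. Hence, again by the bounded convergence theorem,

$$\begin{aligned}
\lim_{k_1 \to \infty} \lim_{j_1 \to \infty} D^{(j_1,k_1)}(a,b) &= \mathrm{E}\Big[ \lim_{k_1 \to \infty} \lim_{j_1 \to \infty} \mathrm{E}[\delta^{(1)}(X_1, \tilde{Y}_1) \,|\, X_1] \Big] \\
&= \mathrm{E}\Big[ \mathrm{E}\Big[ \lim_{k_1 \to \infty} \lim_{j_1 \to \infty} \delta^{(1)}(X_1, \tilde{Y}_1) \,\Big|\, X_1 \Big] \Big] \\
&= \mathrm{E}[\delta^{(1)}(X_1, Y_1)]. \quad \text{(A-8)}
\end{aligned}$$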



The prior analysis captures the expected distortion when the continuous channel is replaced by a finite-alphabet
channel only at time 1. To finish the proof we use induction. Assume that for times $1, 2, \ldots, t-1$, the continuous
channel can be replaced by a finite-alphabet channel without, asymptotically, changing the expected average distortion.
That is,

$$\lim_{k_1 \to \infty} \lim_{j_1 \to \infty} \cdots \lim_{k_{t-1} \to \infty} \lim_{j_{t-1} \to \infty} D^{(j^{t-1},k^{t-1})}(a,b) = D(a,b) = \mathrm{E}[\delta^{(t-1)}(X^{t-1}, Y^{t-1})], \quad \text{(A-9)}$$

where $D^{(j^{t-1},k^{t-1})}(a,b)$ denotes the expected average distortion between $U^L$ and $\hat{U}^L$ in the modified network
when the parameters of the channel input and output quantizers, at times $\tau \in \{1, \ldots, t-1\}$, are $(j^{t-1}, k^{t-1})$, and
let

$$\delta^{(t)}(x^t, y^t) \triangleq \mathrm{E}[d(U^L, \hat{U}^L) \,|\, (X^t, Y^t) = (x^t, y^t)].$$

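Now we need to show that if we add the quantizers at time $t$ as well, the performance does not change.
In the original network,

$$\mathrm{E}[d(U^L, \hat{U}^L)] = \mathrm{E}[\delta^{(t)}(X^t, Y^t)] = \mathrm{E}\big[ \mathrm{E}[\delta^{(t)}(X^t, Y^t) \,|\, (X^{t-1}, Y^{t-1})] \big], \quad \text{(A-10)}$$

and in the modified network,

$$D^{(j^t,k^t)}(a,b) = \mathrm{E}[\delta^{(t)}(\tilde{X}^t, \tilde{Y}^t)] = \mathrm{E}\big[ \mathrm{E}[\delta^{(t)}(\tilde{X}^t, \tilde{Y}^t) \,|\, (\tilde{X}^{t-1}, \tilde{Y}^{t-1})] \big], \quad \text{(A-11)}$$

where for $t' \in \{1, \ldots, t\}$, $\tilde{X}_{t'}$ is the channel input at time $t'$ when the given code is applied and the Gaussian
channel is replaced by its quantized approximation, and $\tilde{Y}_{t'} = [[\tilde{X}_{t'}]_{j_{t'}} + Z_{t'}]_{k_{t'}}$. Note that $\tilde{X}_1 = X_1$.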

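While $X_t$ and $\tilde{X}_t$ might have different distributions due to the quantizations at times $t' = 1, \ldots, t-1$, their
conditional distributions given the inputs and outputs of the channel up to time $t-1$ are identical in both networks,
i.e.,

$$P\big(X_t \le x_t \,\big|\, (X^{t-1}, Y^{t-1}) = (x^{t-1}, y^{t-1})\big) = P\big(\tilde{X}_t \le x_t \,\big|\, (\tilde{X}^{t-1}, \tilde{Y}^{t-1}) = (x^{t-1}, y^{t-1})\big). \quad \text{(A-12)}$$

Let

$$\gamma(x^{t-1}, y^{t-1}) \triangleq \mathrm{E}\Big[ \int \delta^{(t)}(x^t, y^{t-1}, Y_t)\, dF\big(x_t \,|\, (x^{t-1}, y^{t-1})\big) \Big] \quad \text{(A-13)}$$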



October 18, 2011 DRAFT



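and

$$\gamma^{(j_t,k_t)}(x^{t-1}, y^{t-1}) \triangleq \mathrm{E}\Big[ \int \delta^{(t)}(x^t, y^{t-1}, \tilde{Y}_t)\, dF\big(x_t \,|\, (x^{t-1}, y^{t-1})\big) \Big], \quad \text{(A-14)}$$

where in the last line we are using (A-12).

Using the same argument as the one used to prove (A-8), it follows that

$$\lim_{k_t \to \infty} \lim_{j_t \to \infty} \gamma^{(j_t,k_t)}(x^{t-1}, y^{t-1}) = \gamma(x^{t-1}, y^{t-1}). \quad \text{(A-15)}$$

Hence,

$$\begin{aligned}
\lim_{k_1 \to \infty} \lim_{j_1 \to \infty} \cdots \lim_{k_t \to \infty} \lim_{j_t \to \infty} D^{(j^t,k^t)}(a,b)
&= \lim_{k_1 \to \infty} \lim_{j_1 \to \infty} \cdots \lim_{k_t \to \infty} \lim_{j_t \to \infty} \mathrm{E}[\gamma^{(j_t,k_t)}(\tilde{X}^{t-1}, \tilde{Y}^{t-1})] \\
&\overset{(a)}{=} \lim_{k_1 \to \infty} \lim_{j_1 \to \infty} \cdots \lim_{k_{t-1} \to \infty} \lim_{j_{t-1} \to \infty} \mathrm{E}\Big[ \lim_{k_t \to \infty} \lim_{j_t \to \infty} \gamma^{(j_t,k_t)}(\tilde{X}^{t-1}, \tilde{Y}^{t-1}) \Big] \\
&= \lim_{k_1 \to \infty} \lim_{j_1 \to \infty} \cdots \lim_{k_{t-1} \to \infty} \lim_{j_{t-1} \to \infty} \mathrm{E}[\gamma(\tilde{X}^{t-1}, \tilde{Y}^{t-1})] \\
&\overset{(b)}{=} D(a,b), \quad \text{(A-16)}
\end{aligned}$$

where (a) follows from (A-15) plus the dominated convergence theorem, and (b) follows from our inductive
hypothesis.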